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Abstract: Ordinary differential equations (ODE's) are widespread models 
in physics, chemistry and biology. In particular, this mathematical formal- 
ism is used for describing the evolution of complex systems and it might 
consist of high-dimensional sets of coupled nonlinear differential equations. 
In this setting, we propose a general method for estimating the parame- 
ters indexing ODE's from times series. Our method is able to alleviate the 
computational difficulties encountered by the classical parametric methods. 
These difficulties are due to the implicit definition of the model. We propose 
the use of a nonparametric estimator of regression functions as a first-step 
in the construction of an M-estimator, and we show the consistency of the 
derived estimator under general conditions. In the case of spline estimators, 
we prove asymptotic normality, and that the rate of convergence is the usual 
Y^-rate for parametric estimators. Some perspectives of refinements of this 
new family of parametric estimators are given. 
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1. Introduction 

Ordinary differential equations are used for tlie modelling of dynamic processes 
in physics, engineering, chemistry, biology, and so forth. In particular, such a 
formalism is used for the description of ecosystems (for example competing 
species in biology), or of cell regulatory systems such as signaling pathways 
and gene regulatory networks [12]. Usually, the model for the state variables 
X = (xi, . . . , Xd)^ consists of an initial value problem 

jx{t) - F{t,x{t),e), Vt G [0,1], 
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where _F is a timc-dcpcndent vector field from M'^ to M'*, d e N, and 6* G 8, 
Q being a subset of a Euclidean space. When data are available such as a 
time series, we are interested in the problem of estimation of the coefficients 
parametrizing the ODE. In principle, this may be done by some classical para- 
metric estimators, usually the least squares estimator [28] or the Maximum 
Likelihood estimator (MLE). Different estimators have been derived in order 
to take into account some particular features of the differential equation such 
as special boundary values (there exists a function g linking the values at the 
boundary i.e. g{x(fi),x{l)) = instead of the simple initial value problem), or 
random initial values or random parameters [9]. Otherwise, there may be some 
variations on the observational process such as noisy observation times that 
necessitate the introduction of appropriate minimization criteria [26]. 

Despite their satisfactory theoretical properties, the efficiency of these esti- 
mators may be dramatically degraded in practice by computational problems 
that arise from the implicit and nonlinear definition of the model. Indeed, these 
estimators give rise to nonlinear optimization problems that necessitate the ap- 
proximation of the solution of the ODE and the exploration of the (usually 
high-dimensional) parameter space. Hence, we have to face possibly numerous 
local optima and a huge computation time. Instead of considering the estimation 
of straightforwardly as a parametric problem, it may be useful to look at it as 
the estimation of a univariate regression function t x{t) that belongs to the 
(finite dimensional) family of functions satisfying (l.f). Alternative approaches 
to MLE has been used in applications, such as two-step methods: in a first step, 
a proxy for the solution of the ODE is obtained by nonparametric methods, and 
in a second step, estimates of the ODE parameter are derived by minimizing a 
given distance. Varah [40] initiated this approach by using and differentiating 
a smooth approximation of the solution based on least-squares splines as [38] , 
or Madar et al. [29] with cubic splines (and a well-chosen sequence of knots). 
In the same spirit, approximation of the solutions of the ODE are provided by 
smoothing methods such as local polynomial regression [22, 30, 11], or neural 
networks [42] , within the same two-step estimation scheme. Slight modifications 
of these approaches are the adaptation of collocation methods to statistical 
estimation where the solution is approximated by Lagrange polynomials [20]. 
This two-step approach has also been considered in the functional data analy- 
sis (FDA) framework proposed Ramsay and Silverman [34], which is based on 
the transformation of data into functions with smoothing cubic splines. Ram- 
say proposed Principal differential Analysis (PDA) [32] for the estimation of 
linear ODE's. PDA has been recently extended to nonlinear ODE's and ame- 
liorated by repeating the two steps, introducing iterated PDA (iPDA) [31, 41]. 
Recently, the iPDA approach has then been extended to a so-called generalized 
smoothing approach by Ramsay et al., [33]. In this algorithm, the smoothing 
and the estimation of the parameter ODE are considered jointly, and Ramsay 
et al. proposed an estimation algorithm inspired from profile likelihood meth- 
ods. The paper provides a versatile estimator, as a lucid account of the current 
status and open questions concerning the statistical estimation of differential 
equations. We refer to [33] and the subsequent discussions for a broad view of 
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the difRcultics for parameter estimation of ODE's. The use of nonparametric es- 
timators is motivated by the computational simphcity of the estimation method, 
but an additional motivation is that the functional point of view enables one to 
use prior knowledge on the solutions of the ODE such as positivity or bound- 
edness whereas it is difficult to exploit the strictly parametric form. Indeed, it 
implies that we have a thorough knowledge of the influence of the parameters 
on the qualitative behavior of the solutions of (1.1), which is rarely the case. In 
this paper, we study the consistency and asymptotic properties of this general 
two-step estimator based on nonparametric estimators of the trajectories of the 
ODE. We focus in particular on the use of spline regression in the first step, as 
it is one of the estimator used previously in the construction of the proxy, but 
we discuss also the generality and potential ameliorations of this approach. 

In the next section, we introduce the statistical model and we define the 
so-called two-step estimator of 6. We show that under broad conditions this 
estimator is consistent, and we discuss some straightforward extensions of this 
estimator to different cases. In section 3, we derive an asymptotic representa- 
tion of the two-step estimator, and we focus on the case of the least squares 
splines. We discuss also the generality of this result with respect to other types 
of nonparametric estimators. In the last section, we give some simulation results 
obtained with the classical Lotka-Voltcrra's population model coming from bi- 
ology. 

2. Two-step estimator 
2.1. Statistical model 

We want to estimate the parameter 6 of the ordinary differential equation (1.1) 
from noisy observations at n points in [0, 1], < < • • • < t„ < 1, 

y.i = x{ti) + ei,i = l,...,n, (2.1) 

where the e^'s are i.i.d centered random variables. The ODE is indexed by a 
parameter G O C and initial value xq] the true parameter value is 9* and 
the corresponding solution of (1.1) is x*. 

The vector field defining the ODE is a function F : [0, 1] x A" x 6 ^ M'' 
{X C M'') of class w.r.t x for every 9 and with m > 1. It is a Lipschitz 
function so that we have existence and uniqueness of a solution xo^xq 

to (1.1) 

on a neighborhood of for each 9 and xq; and we assume that every corre- 
sponding solution can be defined on [0, 1]. Hence, the solutions xe.xo belong to 
C™+^([0, 1],R'^). Moreover, we suppose also that F is a smooth function in 9 so 
that each solution xg^xo depends smoothly^ on the parameters 9 and xq. Then, 
we suppose that F is of class in 9 for every x. Let /s be the density of the 



-"^if F depends smoothly on x and 8 then the solution depends on the parameter by the 
same order of smoothness, see Anosov & Arnold, Dynamical systems, p. 17. 
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noise e, then the log-hkehhood in the i.i.d case is 



and the model that we want to identify is parametrized by xp , S) € QxXx 
for instance when the noise is centered Gaussian with covariance matrix E {S~^ 
is the set of symmetric positive matrices). An alternative parametrization is 
{9,xe,xo:^) G 6 X X S'^ , with !F the set of functions that solve (1.1) for 
some 9 and xq, thanks to the injective mapping between initial conditions and 
a solution. 

In most applications, we are not really interested in the initial conditions 
but rather in the parameter 9, so that xq or xg^x^, can be viewed as a nuisance 
parameter like the covariance matrix S of the noise. We want to define esti- 
mators of the "true" parameters {x*,9*) {x* = xe-',x*) that will be denoted 
by (xn,9n)- The estimation problem appears as a standard parametric prob- 
lem that can be dealt with by the classical theory in order to provide good 
estimators (with good properties, e.g. y^-consistency) such as the Maximum 
Likelihood Estimator (MLE). Indeed, from the smoothness properties of F, the 
log-likelihood l{9,xo) is at least w.r.t {9,xo) so that we can define the score 

s{9,xn) = (fl^ ^^^)^- s{9,xo) is square integrable under the true prob- 
ability P[x*.e*)-: we can claim under weak conditions (e.g. theorem 5.39 [39]) 
that the MLE is an asymptotically efficient estimator. The difficulty of this ap- 
proach is then essentially practical because of the implicit dependence of x on 
the parameter {9,xo), which prohibits proper maximization of l{9,xo). Indeed, 
derivative-based methods like Newton- Raphson are not easy to handle then and 
evaluation of the likelihood necessitates the integration of the ODE, which be- 
comes a burden when we have to explore a huge parameter space. Moreover, 
the ODE's proposed for modelling may be expected to give a particular quali- 
tative behavior which can be easily interpreted in terms of systems theory, e.g. 
convergence to an equilibrium state or oscillations. Typically, these qualitative 
properties of ODE are hard to control and involve bifurcation analysis [25] and 
may necessitate a mathematical knowledge which is not always accessible for 
huge systems. Moreover, boundedness of the solution x* (a < x*{t) < b, with 
a,b G K'^) may be difficult to use during the estimation via the classical device of 
a constraint optimization. Hence, these remarks motivate us to consider the es- 
timation of an ODE as a functional estimation and use flexible methods coming 
from nonparametric regression from which we could derive a likely parameter 
for the ODE. 

2. 2. Principle 

We use consistent nonparametric estimators of the solution x* and its deriva- 
tive X* in order to derive a fitting criterion for the ODE and subsequently 
the M-estimator of 9* corresponding to the criterion. We denote by = 



n 




(2.2) 
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Uo < Q ^ oo, the L''{w) norm on the space of integrable 

functions on [0, 1] w.r.t. the measure w (and L'^ is the classical norm with re- 
spect to Lebesgue measure). We suppose that w is a continuous and positive 
function on [0, 1]. The principle of two-step estimators is motivated by the fact 
that it is rather easy to construct consistent estimators of x* and of its deriva- 
tive X* . The estimation of the regression function and its derivatives is a largely 
addressed subject and several types of estimators can be used such as smooth- 
ing splines (more generally smoothing in Reproducing Kernel Hilbcrt Spaces 
[43, 3]), kernel and local polynomial regression [13], series estimators [10]. So, 
one can construct a consistent estimator x„ of x* that can achieve the optimal 
rate of convergence with an appropriate choice of regularization parameters. 
One can derive also from x„ an estimator of the derivative x* by differenti- 
ating directly the smoothed curved x„ so that we have Xn = Xn- This simple 
device provides consistent estimator of the derivative for Nadaraya- Watson es- 
timators [17], spline regression [4")], smoothing splines [43], but other consistent 
estimators of the derivative can be obtained from local polynomial regression or 
wavelet estimator (e.g [()]). In this section, we consider then simply that we have 
two general nonparametric estimators x„ and i„, such that ||in ~ = op{l) 
and ||i„ — x*\\q = op(l). Hence, for the estimation of the parameter 9, we may 
choose as criterion to minimize the function Rn = II •'^n ^ F{t,Xm9)\\q^w 
from which we derive the two-step estimator 

9n - argmini?^,J0). (2.3) 

Thanks to the previous convergence results and under additional suitable con- 
ditions to be specified below, we can show that 

Rl^i9)^Kid)^\\^*~Fi;'^*,0)\\q,^ 

in probability, and that this discrepancy measure enables us to construct a 
consistent estimator 6'„. 

We are left with three choices of practical and theoretical importance: the 
choice of q, the choice of w and the choice of the nonparametric estimators. 
In this paper, we show the consistency in a general situation, but we provide 
a detailed analysis of the asymptotics of 0„ when g = 2, i„ is a regression 
spline (least-squares splines, with a number of knots depending on the number 
of observations n), and when i„ = Xn- In that case, the two-step estimation 
procedure is then computationally attractive as we have to deal with two (rela- 
tively simple) least-squares optimization problems (linear regression in the first 
step and nonlinear regression in the second step). As a consequence, this is a 
common approach in applications and we will put emphasis on the importance 
of the weight function w in asymptotics. Despite the limitation to regression 
splines, it is likely that our result can be extended to classical nonparametric 
estimators, as the ones listed above. 
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2. 3. Consistency 

We show that the minimization of Rnwi^) gives a consistent estimator for 6. 
We introduce the asymptotic criterion 

/ 1-1 ^ 1/9 

Rl{0)=[ / (F{-,x\e*)-F{-,x\6)fwdt 



\J0 

derived from i?^ ^ and wc make the additional assumption: 

Ve>0 inf Rl{e)>Rlie*). (2.4) 
||e-6i*||>£ 

which may be viewed as an identifiabihty criterion for the modcL 

Proposition 2.1. We suppose there exists a compact set K, <Z X such that 
y9 G 0,Vxo G X,\/t G [0,1], xg^xoit) is in IC, and w is a positive continuous 
function on [0,1]. Moreover we suppose that uniformly in {t,9) G [0,1] x Q, 
F(t^-^9) is K— Lipschitz on IC. If in and in are consistent, and in{t) G JC 
almost surely, then we have 

snp\RlJe)^Rl{e)\^op{l). 
flee 

Moreover, if the identifiabihty condition (2.4) is fulfilled the two-step estimator 
is consistent, i.e. 

On-e* ^Op{l). 

Proof. In order to show the convergence of \Rnw{^) ^ ~ 111-''" ^ 

F{-,imO)\\.u,^q - \\F{-,x*,9) - 6'*)||g|, we make the foUowing decomposi- 

tion 

\RIA(^)-Rim < II {in-F{;in,e))+{Fi;X*,e)-F{;X*,e*))\\,,^ 
< ||.i„ + \\F{;in,e)-F{;X*,0)\\,^^. 

(2.5) 

Since all the solutions X0_xa(t) and in(t) stay in IC d X, and x i-^ F{t,x,9) are 
K— Lipschitz uniformly in (t, 9), we obtain for all (<, 9) G [0, 1] x 9 

\\F{;in,9)-F{;X*,9)\\g^.^ < KM (^J^ \init) - X* (t)]'^ dtj =KM\\Xn-X*\\,. 

(2.6) 

where M is a upper bound for w. Together, (2.5) and (2.6) imply 

Snp\R%{9)-Rl,m < \\in-F{;X*,9*)\\^u.g 

flee 

+ sup ||F(-,x„,0)-F(-,x*, 0)11^,, 
flee 

< Mx \\in-F{x*,9*)\\q + KM X ||i„-a;*||,. 
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and consequently, by the consistency of and Xn, 

snp\Rl^ie)-Rli9)\ = opil). 

With the additional identifiability condition (2.4) for the vector field F, Theorem 
5.7 of [39] implies that the estimator 6n converges in probability to 6**. □ 

The minimization of a distance between a nonparametric estimator and a 
parametric family is a classical device for parametric estimation (Minimum Dis- 
tance Estimators, MDE). For instance, it has been used for density estimation 
with Hellinger distance [2] , or for regression with distance in the framework 
of regression model checking [24]. The difference between two-estimators and 
MDE's is that two-step estimators do not minimize directly the distance be- 
tween the regression function and a parametric model, but between the deriva- 
tive of the regression function and the parametric model. This slight difference 
causes some differences concerning the asymptotic behavior of the parametric 
estimators. Nevertheless, two-step estimators are closer to MDE's than the gen- 
eralized smoothing approach of Ramsay et al. [.ii] that uses also parametric and 
nonparametric estimators. These ingredients are used in a different manner, as 
the criterion maximized by the generalized smoothing estimator can be seen as 
the Lagrangian relaxation of a constrained optimization problem in a space of 
cubic splines (with given knots): the parametric and nonparametric estimators 
solve 

n 

minJZ ~ subject to \\g - F{-,g,6)\\l < e. 

The smoothing approach finds a curve that solves approximately a differential 
equation, which is then close to the spline approximation of the solution of 
the ODE (with parameter 6) computed by collocation [7]. Moreover, the op- 
timization problem is solved iteratively which implies that the nonparametric 
estimator is computed adaptively with respect to the parametric model and the 
data are repeatedly during estimation, whereas two-step estimators used the 
data once without exploiting the parametric model. 

Remark 2.1 (Local identifiability). The global identifiability condition (2.4) 
is difficult to check in practice and in all generality. In the case q = 2, we can 
focus on a simpler criterion by considering the Hessian J* of [9) at 6 = 0*: 

J* = I {D2F{t,x*{t),0*))'^ D2F{t,x*{t),e*)w{t)dt. 
Jo 

If J* is then nonsingular, the asymptotic criterion behaves like a positive definite 
quadratic form on a neighborhood V{9*) of 6*, so the criterion R^ separate 
the parameters. It is then critical to use a positive weight function w, or else 
some part of the path t i-^ x* {t) might be useless for the identification of the 
identification and discrimination of the parameter. At the opposite, w might be 
used to emphasize some differences between parameters. As it is shown in the 
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next section, the value of w at the boundaries has a dramatic influence on the 
asymptotics of the two-step estimator. 

Remark 2.2 (Random times and errors in times). If the observation times 
ii, . . . , i„ are reahzations of i.i.d. random variables (Ti, . . . , T„) with common 
c.d.f Q, the nonparametric estimators £„, as the one used before, are relevant 
candidates for the definition of the two-step estimator since they are still con- 
sistent under some additional assumptions on Q. 

As in the setting considered by Lalam and Klaassen ['2()], the observation 
times may be observed with some random errors = ti + rii,i ~ 1, . . . , n, (the 
77i's being some white noise) so we have to estimate x from the noisy bivariate 
measurements (rj, j/j). Consistent nonparametric estimators have been proposed 
for the so-called "errors-in-variables" regression and some examples are kernel 
estimators [14] and splines estimators [23] (in the sense). Hence, we can 
define exactly the same criterion function i?^ and derive a consistent parametric 
estimator. 



2.4- Partially observed ODE 

The estimator proposed can be easily extended to cases where several variables 
are not observed. Indeed, if the differential system (1.1) is partially linear in the 
following sense 

\u = G{u,v;t]) ,^ . 

V = H{u;ri) + Av ^ ' ' 

with X = {u^ )^ , u G M''i, being observed, v £ M.'^^ being unobserved, and 
di + d2 = d (the initial conditions are xq = {uj Vq)^), i.e. x{ti) is replaced by 
u{ti) in (2.1) (the noise being then di -dimensional). We want to estimate the 
parameter 9 = (77, A) when _ff is a nonlinear function and A is a matrix, so we can 
take advantage of the linearity in v in order to derive an estimator for v. We can 
derive a nonparametric estimator for v by using Un and the fact that t v(t) 
must be the solution of the non- homogeneous linear ODE v = Av + H{un]r]), 
u(0) = wq. The solution of this ODE is given by Duhamel's formula [19] 

Vt e [0, 1], ■!;„(*)= exp(iA)uo + / exp ((i - s)yl) i7(M„(s); 6')ds, (2.8) 

Jo 

which then can be plugged into the criterion R1{9). This estimator depends 
explicitly on the initial condition wq which must be estimated at the same time 

{0, Wo) arg min ^,-,{1^, A, wo) = u„ - F(ti„, v^] 77) 

(77,A,i>o) 

As previously, if H is uniformly Lipschitz the integral exp {{t — s)A) H{un] 9)ds 

converges (uniformly) in probability in the sense to exp {{t — s)A) H{u*; 6)ds 
as soon as ■«„ does, hence R'j^{9, A^vq) converges also uniformly to the asymp- 
totic criterion 

Rl{9,v^) ^Wii* ~ F{u\v*-9)\L,,- 
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The estimator {9, vq) is consistent as soon as R1,{9, uq) satisfies the identifiabihty 
criterion (2.4). 

3. Asymptotics of two-step estimators 

We give in this section an analysis of the asymptotics of two-step estimators. We 
focus on the least squares criterion R^^ ^, when the estimator of the derivative 
is i„. In that case, we show that the two-step estimator behaves as the sum of 
two linear functionals of a;„ of different nature: a smooth and a non-smooth one. 
Under broad conditions, we show that the two-step estimator is asymptotically 
normal and that we can derive the rate of convergence. In particular, we show 
that it is the case for the regression splines. A key result is that the use of a 
weight function w with boundary conditions w{0) = w{l) = makes the non- 
smooth part vanish, implying that the two-step estimator can have a parametric 
rate of convergence. In the contrary, the two-step estimator has a nonparametric 
rate. 

3.1. Asymptotic representation of two-step estimators 

Wc derive an asymptotic representation of 0„ by linearization techniques based 
on Taylor's expansion of i?^ ^ around 9* . This shows that the two-step estima- 
tor behaves as a simple linear functional of x„. We introduce the differentials 
of F at (x, 9) w.r.t. x and 9 and we denote them DiF{x, 9) and D2F{x, 9) re- 
spectively. For short, we adopt the notation D^F* , i = 1,2 for the functions 
t^^ D,F{x*{t),9*), i = 1,2, and D12F* lor t ^ DiD2F{x*{t),9*). 

Proposition 3.1. We suppose that D12 exists and w is differentiable. We in- 
troduce the two linear operators Fg ,„ and Ff, defined by 

TsMx) = [D2F*^[t)D^F*{t)w{t) - ^^{D2F*{t)w{t))^ x{t)dt 

and 

TbA^) = w{l)D2F*^ il)x{l) - wiO)D2F*^ iO)x{0) 

If DiF, D2F are Lipschitz in {x,9), J* is invertible, and Xn,Xn are (resp.) 
consistent estimators of x* and x* , then 

k~e* ^ (F,,„(a;„) - TsAx*) + rb,^(x„) - Fb,,„(x*)) + op(l). 

Proof. For sake of simplicity, we suppose that the vector field F does not de- 
pend on t, but the proof remains unchanged in the non- autonomous case. We 
remove also the dependence in t and n for notational convenience and intro- 
duce F* and F{x,9*). The first order condition VgR^{9) = implies that 
^l{D2F{x,9)y{i - F{x{t),9))wdt = 0. This gives 

{p2F{x, 9)^ ^ ((£ ~x')+F* - F{x, 9*) + F(£ , 9*) - F{x, 9))wdt = 
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and 



(^D2F{xJ)^ {{3: - x') + DiF{i* ,e*){x* - x) + D2F{x,e*){e* - e))w dt = 

with x* and 9* being random points between x* and x, and 9* and 9 respectively. 
We introduce D2F = D2F{x, 9), and an asymptotic expression for {9* — 9) is 

{9*~9) [ dIf^ D2F{x,9*)wdt = - / D^F^ {i - x' )w dt 

JQ Jo 



[ dIf^ DiF{i*,9*){x* - x)wdt. 
Jo 



It suffices to consider the convergence in law of the random integral Hn = 
Jg^(£)2^"*) {{x ~ x') + DiF*{x* — x)^wdt because the random variable 

G„ = y D2F^(^{i;-x') + DiF{i*,9*)ix* ~x)^wdt 

is such that G„ — Hn ^ in probability (in the sense) , moreover we have the 

convergence in probability of D2F D2F{x, 9*)dt to J*, the Hessian of R^. 

Indeed; we consider the map V : {x,9) 1-^ (t 1-^ D2F{x(t),9)) defined on 
C([0, 1],/C) X e (with the product Hilbert metric) with values in C([0, 1],R"^p) 
(with the norm \\A\\2 = Tr{A^ {t)A(t))dt). Since D2F is Lipschitz in (x, 9), 
the functional map I? is a continuous map. and we can claim by the continu- 
ous mapping theorem that the random functions D2F and t s- D2F{x{t),9*) 
converge in probability (in the L^sense) to D2F*. As a consequence, H-Da-fjla 
converges in probability to ||-D2-F'*||2 so it is also bounded, and \\D2F{x,9*) — 
D2F*\\2 in probability. This statement is also true for all entries of these 
(function) matrices, which enables to claim that all entries of the matrix 

^ (^dIfY {D2F{x,9*) - D2F*)w{t)dt 

tend to zero in probability (by applying the Cauchy-Schwarz inequality com- 
ponentwise). Moreover, we have convergence in probability of each entry of 
j'^{D2Fy D2F*wdt to the corresponding entry of j'^ {D2F*)^ D2F*dt (conse- 
quence of the convergence of D2F{x, 9) to D2F* in the sense), which implies 
finally that 

^ [D2F{x,9)y D2F{x,9*)wdt ^ r . 

By the same arguments and by using the fact that DiF is also Lipschitz in 
{x,9), we have convergence of the matrix G„ — Hn to in probability. The 
asymptotic behavior of {9n — 9*) is then given by the random integral 

J*-^ ^ {D2F*f (^{i^x') + DiF*{x* -x)'^wdt. (3.1) 
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As this functional is linear in (x — x') and (x* — x), this is equivalent to study 
the convergence of the functional T{x) to r(a::*), where 

T{x)= [ {D2F*f x(t)wdt+ [ D2F*^DiF*x(t)wdt. 
Jo Jo 

Since D12F and ib exist, we can integrate by part the first term of the right-hand 
side, 

{D2F*{t)f x{t)w{t)dt = [D2F*^xw]l - / -{D2F*{t)w{t))x{t)dt. 



As a consequence, V is the sum of the linear functionals Fs.u,, r^.^, introduced 
before: 



T{x) = j \yD2F*' DiF*w{t)--{D2F*{t)w{t))jx{t)dt 
+ L>2F*T(l)a;(l)w(l) - D2F*^ {Q)x{Q)w{Q) . 
Finally, by linearity we can write 

e-e* ^ {Ts{x) - Ts{x*) + rb,„(i) - rb,„(x*)) + op(i). 



□ 



Proposition 3.1 shows that two-step estimators behave asymptotically as the 
sum of two plug-in estimators of linear operators: a smooth functional Tg^w 
and a non-smooth functional Fb tu, the latter being a weighted evaluation func- 
tional (at the boundaries). It is well-known that the asymptotic behavior of 
these plug-in estimators are of different nature: Stone [35] showed that the best 
achievable rate for pointwise estimation over Sobolev classes is jT,"/2a+i (where 
Q is the number of continuous derivatives of x), whereas integral functionals 
/ h{t)x{t)dt can be estimated at a root-n in a wide variety of situations, e.g. 
Bickel and Ritov [4]. In particular, we will show in the next section that rs^u,(a;*) 
can be estimated at a root-n rate with regression splines. A direct consequence 
of this asymptotic consequence is that two-step estimators does not achieve the 
optimal parametric rate in generality (with whatever weight function w). Hence, 
two-step estimators used in applications computed with a constant weight func- 
tion have degraded asymptotic properties. Moreover, the classical bias- variance 
analysis of pointwise nonparametric estimation enables to give additional in- 
sight in the behavior of x. Indeed, it is well known that the quadratic error 
yt e [0, 1], E[{x{t) - x*{t))'^] = b{t)'^+v{t), where b is the bias function and v is 
the variance function. Although Tg^^{xn) is also a biased estimator of u,(a;*), 
this bias can converge at a faster rate than root-77. (under appropriate conditions 
on Xn), so that the bias of 6'„ is mainly due to the bias terms 6(0) and 6(1). 
This remark strongly motivates the use of nonparametric estimators with good 
boundary properties for reducing the bias of 0: a practical consequence of propo- 
sition 3.1 is that local polynomials should be preferred to the classical kernel 
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regression estimator because local polynomials have better boundary properties 
at the boundaries of the interval [13]. Nevertheless, a neat approach to the ef- 
ficiency of two-step estimators is to restrict to the case of a weight function w 
with boundary conditions w{0) ~ w{l) = 0, so that Tb^^ does vanish and the 
two step estimator is then asymptotically equivalent to the smooth functional 

i-C 

- 6* =Ts,Ui) ~T,^^{x*) + op{l). 

Then, it suffices to check that the plug-in estimator rs(a;) is a root-n rate esti- 
mator of r(a;*), which depends on the definition of Xn- We detail then essential 
properties of regression splines in the next section, and derive the desired plug-in 
property. 

3.2. Two-step estimator with least squares splines 

A spline is defined as a piecewise polynomial that is smoothly connected at its 
knots. For a fixed integer fc > 2, we denote S(^, k) the space of spline functions 
of order k with knots ^ = (0 = < < ■ ■ ■ < = 1): a function s in §(4, fc) 
is a polynomial of degree fc — 1, on each interval [^i, Ci+i]; i ~ 0, . . . , L and s 
is in C^~^ . S(^, fc) is a vector space of dimension L + k and the most common 
basis used in applications arc the truncated power basis and the B-splinc basis. 
The latter is usually used in statistical applications as it is a basis of compactly 
supported functions which are nearly orthogonal. B-splines can be defined re- 
cursively from the augmented knot sequence t = {tj, j = 1, . . . , L + 2k) with 

Tl = • • • = Tfc = 0, Tj+k = Cj, j = 1, . . . , L and TL+k+l = • • • = TL+2fe = 1. 

We note Bi ^', the i*^ B-spline basis function of order fc' (1 < fc' < fc) with the 
corresponding knot sequence t. The B-spline basis of order fc' = 1, . . . , fc are 
linked then by the recurrence equation: 

Vi=l,...,L + 2fc-l,Vie [0,1], B,s{t) = l[r,,r,^,]{t) 

and Vi = 1, . . . , L + 2fc - fc', Vfc' = 2, . . . , fc, Vi G [0, 1], 

A (univariate) nonparametric estimator a;„ is then computed by classical 
least-squares (the so-called least-squares splines or regression splines). Most of 
the time, cubic splines (fc = 4) are used and the essential assumptions are made 
on the knots sequence ^ in an asymptotic setting: it is supposed that the number 
of knots L„ tends to infinity with a controlled repartition. The well-posedness 
of the B-spline basis (and corresponding design matrix) is critical for the good 
behavior of the regression spline, and the needed material for the asymptotic 
analysis can be found in [44] . We define below the nonparametric estimator that 
we use. 

We have n observations ?/i , . . . , ?/„ corresponding to noisy observations of 
the solution of the ODE (1.1), and we introduce Qn the empirical distribu- 
tion of the sampling times times ti,...,t„. We suppose that this empirical 
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distribution converges to a distribution function Q (which possesses a density 
q w.r.t Lebesgue measure). We construct our estimator Xn of x* as a func- 
tion in the space §(^„,fc) of spUne functions of degree k and knots sequence 
= {^oAii ■ ■ ■ i^i„+i) {Kn is the dimension of fc)). The knots sequence 
is chosen such that maxi<i<L„ — hi\ = o(L~^), |^|/ min^ /i^ < AI where 
hi = {£,i+i —60 and 1^1 = supj hi is the mesh size of ^. As a consequence, we have 
1^1 = 0{L~^). Like in Zhou et al. [44], we suppose that we have convergence of 
Qn towards Q at a rate controlled by the mesh size, i.e. 

sup \Qnit)-Qit)\^om. (3.2) 
te[o,i] 

The estimator i„ we consider is written componentwise in the basis of B-splines 
(i?i, . . . , Bk„) 

\/i = 1, . . . ,d, Xn,i =^CtkBk (3.3) 

or in matrix form a;„ = C„B with the vector- valued function B = {Bi, . . . , Bk„)^ 
and the d x Kn coefficient matrix C„ = {c"f^)i<i.k<d,Kr, (and column vectors 
Ci,n = (cii, . . . , Ci^KrS' S R^"). We stress the fact that all the components Xn,i 
are approximated via the same space, although it may be inappropriate in some 
practical situations but it enables to keep simple expressions for the estimator. 
The fact that we look for a function in the vector space spanned by B-splines, 
puts emphasis on the regression interpretation of the first step of our estimating 
procedure. The estimation of the parameter C„ can be cast into the classical 
multivariate regression setting 

Y„ = B„C^ + e„, (3.4) 

where Y„ = (Yi . . . Y^) is the nxd matrix of observations, e is the nxd matrix 
of errors, is the Kn x d matrix of coefficients and B„ = (Sj(ii))i<i<„_i<j</f^ 
is the design matrix. We look for a function close to the data in the LP' sense, 
i.e. we estimate the coefficient matrix C„ by least-squares 

n 

Ci.n = arg min 'S^iVij - B(tj)^c)^, i = 1, . . . , d, 
J=l 

which gives the least squares estimator C„ = (B^B„)+B^Y where (•)+ denotes 
the Moore-Penrose inverse. We have 

Vie {l,...,d},VtG [0,1], i,,„(i) -B^(i)c,.„, 

where = (B^B„)+B^Yi. Finally, we introduce the projection matrix Pb.u = 
B„(B^B„)+B^. We will use the notation x < y to denote that there exists a 
constant Af > such that x < My. 

General results given by Huang in [21] ensure that Xn — >■ x* in probability for 
sequences of suitably chosen approximating spaces S(fc,^„) with an increasing 
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number of knots. Indeed, corollary 1 in [21] enables us to claim that if the 
observation times are random with Q{B) > cX{B) (0 < c < 1 and A(-) is the 
Lebesgue measure on [0, 1]), the function x* is in the Besov space oo (with 
k > a — 1) and the dimension grows such that lim„ ^" ^" = then 

1 " / K 



i=l 



n 



Moreover, the optimal rate Op(n~^"/^^"+-'^)) (given by Stone [36]) is reached for 
Kn ~ n}^^'^°'~^^\ For this nonparametric estimator, it is possible to construct a 
consistent two-step estimator 0„ by minimization of wi^)- 

We need then a general result for the asymptotics of linear functionals of 
spline estimators. This can be seen as a special case of a general result derived 
by Andrews for series estimators [1]. First, we need to have a precise picture 
of the evolution of the basis (_Bi, . . . ,Bk„) as Kn oo and particularly the 
asymptotic behavior of the empirical covariance Gk^.u — ^(B^B„) and of the 
(theoretical) covariance Ga'„ = J^B{t)B{tydQ{t). From [44], we have that if 
Kn = o{n), 

yt e]0, 1], B^it) (B,TB„)"' Bit) = iB(t)TG^i B(t) +o(^^^ ■ (3.5) 

The analysis of section 3.1 gives interest in the asymptotic behavior of Taixn) 
where Fa is a linear functional T{x) = a{s)'^ x{s)ds where a is a function in 
Ci([0,l],M''). If X = C^B, T{x) = J^LicJli = Trace{C^j) with 7^ the 
columns of the K x d matrix 7 = B{s)A^ {s)ds. Hence, the asymptotic 
behavior is derived directly from the asymptotics of C„ and of matrix 7. By 
using the results from Andrews [l], we derive the asymptotic normality of this 
functional. For simplicity, we consider only the case d = 1, the extension to 
higher dimensions is cumbersome but straightforward. If the variance of the 
noise is a^, the variance of c„ is 

Vicn) = a^{BlBn)+ (3.6) 

and the variance of the estimator of the functional is 

Vn = ViTixn)) = a27T(B^B„)+7„. (3.7) 

Proposition 3.2 (Convergence of linear functionals of regression splines). 

Let (^„)n>i be a sequence of knot sequences of length Ln + 2, and Kn = 
dim(S(^„, fc)), with k < 2. We suppose that Ln ^ 00 is such that n^^^\^n\ ^ 
and n\^n\ 00. If a : [0,1] R is in , and x is in C", 2 < a < k, 
T{x) = /q a(s)x{s)ds then: 

(i) T{xn) — ^{x) = Op(n^^/^) and Y^(F(a;,i) — F(a::)) is asymptotically normal, 
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(ii) Vt e [0, l],Jnit) - x{t) = Op(n-i/2|ej-i/2)^ 

(iii) V{xnit))^^^'^{xnit) — x(t)) is asymptotically normal, t G [0, 1]. 

Proof. In order to prove the asymptotic normality of r(a;„) — r(x), we check 
the assumptions of theorem 2.1 of [1]. Assumption A is satisfied because the Ci's 
are i.i.d. with finite variance. For assumption B, since a is C^, the functional is 
continuous with respect to the Sobolev norm (or simply the sup norm). Ftom the 
approximation property of spline spaces [7], it is possible to construct a spline 
a = X]i="i CiiBi = Q;,[B € §(^„, k) such that \\a — a\\oo = 0{\$,\^). The quality 
of approximation of the functional Ta is directly derived: |ra(a;) — ra(a;)| = 
I /g (a — a)[s)x{s)ds\ < |^|||a;||oo- Hence, it suffices to look at the case a = a^B 
because Ta{x) — Taix) will tend to zero at faster rate than n^/^. We introduce 
the vectors 7„ = (TaiBi) . . . TaiEKj)^ , so we have jjjn = ol^ GJ^^Gr^ol > 
^min^K„ X \\ot\\2- Froui Icmma 6.2 of [44] (giving bounds on the eigenvalues of 
Gif^), we get 7^7„ > |^| ||a||2. Lemma 6.1 from [44] (about the equivalence ofL^ 
norm and Euclidean norm of spline coefficients) ensures that 7^7„ is bounded 
away from because 

mMl> I a\s)dQr,{s) 



hence liminf„7^7„ > and assumption B is satisfied. 

From (3.5), we get the behavior of the diagonal entries of Pb,„: 

Vz e [l..Ag, {PB,n\H = -B(e;)^G]^^B(e,) + o((n|||)-i) (3.8) 

we see that assumption C(ii) is true because B(^i)^G]^^B(^i) < ci||B(^i)||2|||~^ 
and ||B(^i)||2 < k (because the B-splines are bounded by 1 and only k of them 
are strictly positive) ensure that maxi(PB,„)M = 0((n|^|)'^) 0. It is clear 
that B„ is of full rank for n large enough. 

Since a < fc, it exists a sequence (a;„) G S(^„,A:) such that ||.t„ — a;*||oo = 
0(1C|") [7], hence 

^ II l|oo ^ 0- 

If we use again the spline approximation of the function a, we derive the following 
expression for 



In 



7„ = G^^Gjl^fiKOL- 



Yrom. lemma 6.2. of [44], we have cx^ Gj^Gj^^^Gxct > ol^ Gkol- As for 7,^7,1, 
we have 

liminf 7^ ( ) 7„ > 0, 

n \ n / 

which remains true when a is any smooth function in G^. 

— 1/2 

According to Andrews, we can conclude Vn (ra(i„) — Ta{x*)) ^ -/V(0, 1). 
We obtain an equivalent of the rate of convergence by the same approximation 
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as above 

14 = 7,T(B^B)~S„ 

n ' 

~ —cx^Gkoi 

n 

i.e. Vn ^ ^^"^^ and wc obtain finally that Vn ^ n~^. 

The technique used by Andrews for his theorem 2.1 gives also asymptotic 
normality of a;„(<) ~ B(t)^Ci^„. Wc have then 

Vt e [0, 1], V{xn{t)) = a2B(i)^(B,TB„)+B(t) 

and from (3.5) we get V{xn{t)) = ^B(t)^G^^B(i)+o(;^), so that V{xn{t)) - 
from lemma 6.6 in [44]. □ 

For deriving the asymptotics of the two-step estimator when regression splines 
are used in the first, wc just need to put the results of propositions 3.1 and 3.2 
together. 

Theorem 3.1 (Asymptotic normality and rates of the two-step esti- 
mator). Let F a C™ vector field w.r.t {0,x) (m > 1), such that DiF,D2F 
are Lipschitz w.r.t {9,x), and D12F exists. We suppose that the Hessian J* 
of the asymptotic criterion R^{9) evaluated at 9* is nonsingular, and that the 
conditions of proposition 2.1 are satisfied. 

Let Xn £ §(^„, k) a regression spline with k > 2, such that n^^^\^n\ ~* '^'^^ 
n|^„| 00, then the two-step estimator On = argming Rn^wi^) asymptotically 
normal and 

(i) «/w(0) = w(l) = 0, then (4 - 0*) = Op{n~'^^^) 

(ii) ifw{0)^Oorw{l)^0, </ien(0„-r) = Op(n-i/2||„|-i/2). 

In the case (ii), the optimal rate of convergence for the Mean Square Error is ob- 
tained for = 0(ni/(2™+3)) and we have then i§n-0*) = Op(n-("+i)/(2m+3))_ 

Proof. From proposition 3.1, wc have 

6-9* = (r,,,„(x) - r,,„(x*) + nA'x) - nA^l) + op(i)- 

so that we just have to apply proposition 3.2 to Ts_m{x) and Ti,^w{x). We can 
claim the asymptotic normality of y/n{Ts{xn)—^s{x*)) and of \rn\^\{Tb,w{x,i)— 
^b.wix*)) (normality is extended from scalar functional to multidimensional 
functional by the Cramer- Wold device). We have then two cases for the rate of 
convergence of 9n, depending on the values w(0), w(l). When w{0) = w{l) = 0, 
there is only the parametric part, but when Ffj ^, does not vanish the nonpara- 
metric part with rate -y/ remains. We can determine the optimal rate of 
convergence in the mean square sense by using the Bias - Variance decompo- 
sition for the evaluation functional - 9*\\^ = Op {{E{xn{t)) - x{t)Y) -\- 
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Op{Var{xn{t))). Theorem 2.1 of [44] gives E {{xn{t) - x* {t) f) = 0(|^,J"+^) 
(because x* is C™+^) and Far(x„(<)) = Op(n^^||„|^^) so the optimal rate is 
reached for ||„| = (^(n-i/^^^+s)) and is 0(„-(2m+2)/(2m+3))^ □ 

Remark 3.1 (Random observational times). The asymptotic result given 
for the deterministic observational times < < • • ■ < ctn < 1 remains true 
when they are replaced by realizations of some random variables Ti , . . . , r„ as 
long as the assumptions of the two previous propositions are true with proba- 
bility one. Andrews gives some conditions (theorem 2) in order to obtain this. 
It turns out that in the case of Ti, . . . , r„ i.i.d. random variables drawn from 
the distribution Q, it suffices to have < rf with < r < 1. In particular, as 
soon as m > 1, the conclusion of proposition 3.1 holds with probability one for 
the optimal rate Kr, ni/(2m+3)_ 

We have restricted theorem 3.1 to regression splines in order to a have precise 
and self-content statement of conditions under which the asymptotics of the 
two-step estimator is known. In particular cubic splines gives root-n consistent 
estimators for smooth differential equation (m > 2) and appropriate weight 
function. 

The main point of our study is that the asymptotic normality and parametric 
rate are derived from the behavior of the smooth functional Fs which can be 
derived for series estimators (polynomials or Fourier series) from [1]. Moreover, 
the same theorem can be derived for kernel regression (Nadaraya- Watson) by 
using the results on plug- in estimators of Goldstein and Messer [18]. More gen- 
erally, the same theorem may be obtained for other nonparametric estimators 
such as local polynomial regression, orthonormal series, or wavelets. 

4. Experiments 

The Lotka-Volterra equation is a standard model for the evolution of prey- 
predator populations. It is a planar ODE 

fi ^ax + bxy ,^ 
1 2/ = cy + dxy 

whose behavior is well-known [19]. Despite its simplicity, it can exhibit conver- 
gence to limit cycles which is one of the main features of nonlinear dynamical 
systems, which has usually a meaningful interpretation. Due to this simplicity 
and the interpretability of the solution, it is often used in biology (population 
dynamics or phenomena with competing species) , but the statistical estimation 
of parameters a, 6, c, d in 4.1 has not been extensively addressed. Neverthe- 
less, Varah (1982) illustrates the two-step method with cubic splines and knots 
chosen by an expert on the same model as (4.1). Froda et al. (2005) [IG] have 
considered another original method exploiting the fact that the orbit may be 
a closed curve for some values of the parameters. In this section, we will con- 
sider a slight generalization and reparametrization of the previous model which 
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consists in adding the squared terms x'^ and y^: 

(x ^ x{aix + a2y + as) 
\y = yihx + b2y + 63) 

We use the two-step estimator obtained by minimizing i?^ ^ (9) in order to illus- 
trate the consistency and the asymptotic normality of the estimator proved in 
the previous section. In particular, we will consider two estimators: one obtained 
with a uniform weight function w ~ I, and a second one with a weight vanish- 
ing at the boundaries. According to theorem 3.1, it gives two different rates 
of convergence: we consider then the influence of the number of observations 
n = 20, 30, 50, 100, 200, 500, 1000. We consider also a small number of obser- 
vations (n = 20, 30) as the nonparametric procedure can give poor results in 
the small-sample case, and the simulation can give an insight into the expected 
properties in this practical situation. As the reconstruction of the solution in 
the first step is critical, we consider the estimation of an ODE with two different 
parameters 9i with oi = 0, 02 = —1.5, = 1, &i = 2, 62 = and 63 = —1.5, 
and 02 with ai — 0, a2 ~ —1.5, 03 = 1, 61 = 2, 62 = 1 and 63 = —1.5. In 
both cases, the parameters of the quadratic terms ai and 62 are supposed to be 
known, so that 4 parameters have to be estimated from noisy discrete observa- 
tions of the solution of the ODE. We suppose also that the initial conditions are 
not known, so that they are nuisance parameters that are not estimated by our 
procedure. These two parameters 0i,02 gives rise to two different qualitative 
behavior of the solution as it can be seen in figures 1, 2 and it gives a view of 
the influence of their shapes on the inference. The shape of the solution has two 
consequences: the identifiability of the model through the asymptotic criterion 
i?^, and the difficulty of the first step (if the curve is rather wiggly or flat). 
The data are simulated according to 2.1 where e is a Gaussian white noise with 
standard-deviation a = 0.2, and the observation times arc deterministic and 
equal to tj = j = 0, . . . , n - 1. 

We define now exactly the two-step estimator that have been used in the 
experiments. In the first step, we have used a least square cubic spline with an 
increasing number of knots Kn- The selection of Kn and of the position of the 
knots is a classical problem in the spline regression literature. The study of the 
asymptotics gives only the rate of Kn (or |^„|) in order to optimize the rate of 
convergence, but one has to find practical devices in order to compute a correct 
estimator. For instance, this choice was left to the practitioner in Varah's paper, 
but due to the intensive simulation framework for the estimation of the bias and 
variance of two-step estimators, we have used an adaptive selection of the knots 
in order to obtain an automatic first step with reliable and stable approximation 
properties. This procedure permits then to focus on the quality of the two-step 
estimator, and not in particular on the first step. We have considered 3 different 
basis of splines with imiformly spaced knots £_j = jj^^, j ~ 0, ■ ■ ■ , K^- 

• if n = 20, 30, Kn = 15, 

• if n = 50, Kn = 20, 

• if ?i = 100, 200, 500, 1000, Kn = 30. 
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(a) Trajectories (time evolution) 










(b) Phase plane 



Figure 1 . Solution of Lotka- Volterra system in the phase plane for 61 and x{0) = 1 et y{0) = 2. 



and we have selected the knots by applying a variable selection approach to 
the truncated power basis (instead of the B-spline basis) defined as the set of 
functions 1, t, . . . , t''"^, {t - ■■ ■ ,{t - The knots arc selected by 

minimizing the Generalized Cross Validation criterion (GCV): 




(a) Trajectories (time evolution) 




(b) Phase plane 



Figure 2. Solution of Lotka- Volterra system in the phase plane for 62 andx{0) = 4 ety{0) = 2. 
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Table 1 

Case 1: Mean and standard deviation of the two-step estimator with a weight function 
vanishing at the boundaries (6i,w) o,nd with uniform weigth (d\). True parameter is 

01 = (-1.5, 1, 2, -1.5) 



n 


mcan(^?^ ^ ) 


mcan(6Ji ) 


std(9i „,) 


std(Si) 


20 


(-0.97, 0.68, 0.99, -0.75) 


(-0.87, 0.57, 0.95, -0.77) 


(0.13, 0.12, 0.17, 0.17) 


(013, 0.12, 0.17, 0.17) 


30 


(-0.95, 0.71, 1.12, -0.95) 


(-0.97, 0.72, 1.10, -0.97) 


(0.10, 0.10, 0.13, 0.12) 


(0.11, 0.10, 0.13, 0.12) 


50 


(-1.12, 0.81, 1.41, -1.14) 


(-1.11, 0.81, 1.38, -1.16) 


(0.11, 0.10, 0.14, 0.13) 


(0.11, 0.10, 0.14, 0.13) 


100 


(-1.28, 0.88, 1.73, -1.34) 


(-1.26, 0.87, 1.71, -1.36) 


(0.14, 0.11, 0.18, 0.16) 


(0.13, 0.10, 0.19, 0.15) 


200 


(-1.37, 0.92, 1.83, -1.4) 


(-1.35, 0.91, 1.81, -1.41) 


(0.09, 0.07, 0.12, 0.1) 


(0.10, 0.07, 0.13, 0.1) 


500 


(-1.42, 0.95, 1.90, -1.44) 


(-1.42, 0.95, 1.90, -1.45) 


(0.06, 0.05, 0.08, 0.06) 


(0.07, 0.05, 0.09, 0.06) 


1000 


(-1.45, 0.97, 1.94, -1.46) 


(-1.45, 0.96, 1.93, -1.46) 


(0.04, 0.03, 0.06, 0.05) 


(0.045, 0.03, 0.07, 0.05) 



Table 2 

Case 1 : Mean Squared Errors of the weighted and unweighted parametric estimators versus 
the Mean Squared Error of the nonparametric estimators of the 2 curves 



n 


RMSE(0i,„) 


RMSE(0i) 


RMSE(0) 


20 


1.41 


1.5 


(1.32, 1.56) 


30 


1.21 


1.21 


(1.05, 1.21) 


50 


0.83 


0.84 


(0.83, 0.9) 


100 


0.47 


0.49 


(0.54, 0.55) 


200 


0.3 


0.32 


(0.38, 0.4) 


500 


0.18 


0.19 


(0.25, 0.26) 


1000 


0.12 


0.13 


(0.18, 0.20) 



where ICm is a subset (of size to) of {£,j,j = 0, . . . , Kn}, and d{lCm) is the effective 
number of parameters. As in [15, 27], we use diJCm) = 3to + 1, and we use an 
heuristic for considering efficiently the various subsets, which is based on the 
ehmination of the less informative knots (for GCV), and the addition of the more 
informative knots in order to minimize the GCV criterion (ElimAdd procedure in 
[27]). This procedure is not optimal but it gives a simple and reliable adaptive 
nonparametric estimators. Other knots selection procedure would have been 
based on free-knot splines [S, 'AT]. Experiments have been also performed in the 
non-adaptive case [■ ■]. 

The quality of the nonparametric estimation procedure is measured by the 



Table 3 

Case 1: Minima of the criteria R, 



n 


-^n,^J;(^l,^^-') 


Rl{9i) 


20 


(6.86, 9.35) 


(11.4, 15.4) 


30 


(3.44, 4.11) 


(4.25, 5.25) 


50 


(2.41, 2.97) 


(2.94, 3.87) 


100 


(1.86, 2.46) 


(2.31, 3.67) 


200 


(1.25, 1.62) 


(1.62, 2.68) 


500 


(0.70, 0.96) 


(0.96, 1.56) 


1000 


(0.47, 0.64) 


(0.64, 1.0) 
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Table 4 

Case 2:Bias and standard deviation of tvio-step estimators with a weight function vanishing 
at the boundaries (82,w) ciiT-d with uniform weight (82) ■ True parameter is 
02 = (-1.5, 1, 1.5, -1.5) 



n 


mcan02 w 


mcan02 


std{92,,„) 


std(S2) 


20 


(-0.88, 0.56, 0.48, 0.03) 


(-1.18, 0.74, 0.22, 0.38) 


(0.27, 0.18, 0.36, 0.52) 


(0.28, 0.20, 0.35, 0.49) 


30 


(-1.18, 0.77, 0.72, -0.34) 


(-1.42, 0.92, 0.46, 0.04) 


(0.25, 0.18, 0.24, 0.35) 


(0.22, 0.16, 0.19, 0.26) 


50 


(-1.17, 0.78, 1.15, -0.94) 


(-1.58, 1.03, 0.99, -0.69) 


(0.18, 0.13, 0.33, 0.49) 


(0.29, 0.20, 0.44, 0.64) 


100 


(-1.39, 0.92, 1.12, -0.93) 


(-1.45, 0.95, 1.01, -0.78) 


(0.18, 0.12, 0.33, 0.48) 


(0.18, 0.19, 0.53, 0.77) 


200 


(-1.43, 0.95, 1.29, -1.19) 


(-1.54, 1.0, 1.17, -1.00) 


(0.13, 0.09, 0.25, 0.37) 


(0.24, 0.16, 0.43, 0.63) 


500 


(-1.45, 0.97, 1.43, -1.40) 


(-1.58, 1.05, 1.34, -1.26) 


(0.08, 0.05, 0.16, 0.24) 


(0.19, 0.12, 0.31, 0.45) 


1000 


(-1.47, 0.98, 1.47, -1.45) 


(-1.58, 1.05, 1.41, -1.37) 


(0.05, 0.04, 0.10, 0.14) 


(0.14, 0.09, 0.20, 0.29) 



Root Mean Squared Error (RMSE) 

RMSE = ( ^ {xj ^x*f {t)dt^ 



1/2 



and is given in tables 2, 5, so that the MSE of the two-step estimator ean be 
related directly to the quality of the first step. 

In a general manner, the second step can be addressed by using whatever 
(nonlinear) optimization procedures. Nevertheless, in the case of the Lotka- 
Volterra model, it turns out that we have a closed-form expression for 9. Indeed, 
the criterion R\ ^ for the first dimension can be written 



20 



{x — x{aix + a2y + a3))^w{t)dt 



XI ( ^^^^j) ^ [a-ixitj) + a2y{tj) + 03) 



where the criterion is approximated by a Riemann sum (Aj is the size of the 
subdivision, supposed to be uniform) and Wj = Ajw{sj)x{sj) is a new weight 
function. Hence, the estimator is the solution of a weighted least-square program 
whose solution exists, is unique and has a known expression. This means that 
the two-step estimator remarkably furnishes a fast and reliable procedure, where 
there is no problem of local minima, although we deal with a nonlinear regression 
model. 

The quality of the two-steps estimator has been evaluated by computing its 
mean and standard deviation through a Monte-Carlo study with 1000 indepen- 
dent drawings. The results are shown in tables 1, 4 for the models with 9i and 
02, and illustrates the consistency of the two-step procedures as n grows to in- 
finity. Wc compute with the same nonparametric estimator x, and either with 
a uniform weight function w = 1, or boundaries vanishing weight. In our exper- 
iment, we use a piecewise linear function with w{0) = 0, w{l) = 1, w(19) = 1 
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Table 5 

Case 2: Mean Squared Errors of the weighted and unweighted parametrics estimators versus 
the mean squared of the nonparametric estimators of the 2 curves 



n 


MSE(e2,„) 


MSE(e2) 


MSE(0) 


20 


2.0 


2.34 


(0.93, 0.97) 


30 


1.48 


1.88 


(0.68, 0.73) 


50 


0.91 


1.14 


(0.57, 0.59) 


100 


0.83 


1.18 


(0.44, 0.40) 


200 


0.52 


0.87 


(0.53, 0.34) 


500 


0.28 


0.56 


(0.20, 0.20) 


1000 


0.17 


0.37 


(0.15, 0.14) 



Table 6 

Case 2: Minima of the criterions 



n 




Rim 


20 


(3.66, 1.99) 


(5.8, 2.80) 


30 


(2.35, 1.16) 


(3.4, 1.57) 


50 


(2.01, 1.55) 


(3.46, 2.29) 


100 


(0.89, 0.40) 


(1.60, 0.69) 


200 


(0.52, 0.34) 


(1.20, 0.61) 


500 


(0.30, 0.19) 


(0.95, 0.36) 


1000 


(0.18, 0.11) 


(0.66, 0.21) 



and ti;(20) = (and w{t) — 1, t E [1, 19]). Wc have also a picture of the behavior 
of the estimator for small sample size and it appears that the two-step estimator 
(weighted or unweighted) is biased, for cases 1 and 2, and the bias diminishes 
significantly when n < 100. An important feature is that the weighted estimator 
is better behaved than the unweighted one, in both experiments and for small n 
and big n. Indeed, the standard deviation of Oi^^ is equal (in case 1) or smaller 
(in case 2) to the one of 9i, but the main difference between the two estimators 
comes from the bias term which induces a bigger RMSE for the unweighted 
estimator, see tables 2, 5. This difference comes from the presence of bias term 
in the evaluation functionals in the unweighted estimator (as it is emphasized 
in theorem 3.1), which can be particularly important at the boundaries. In case 
2, this difference is very important, due to the flat part for t > 10. 

From the expression of the i?^ ^, it is clear that the quality of the 9 is di- 
rectly related to the distance ||.t — as one can see in tables 2, 5 but our 
experiment shows that it is not sufficient. First, one need also to estimate cor- 
rectly the derivative of the solution, and second a low minimum of the criterion 
i^n wi^n,w)) does not indicate a good estimator. This difficulty arises when we 
compare the distance of the nonparametric estimator and the value of the cri- 
terion (see tables 3, 6). For instance, we have a better nonparametric estimator 
and criterion in case 2 than in case 1 (e.g. n = 100), but we have a lower RMSE 
in case 1. Hence, case 2 appears as a more difficult model to estimate, and the 
shape of the solution has a direct influence on the asymptotic criterion and 
on the ability to approximate it. We have also controlled the normality of the 
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two-step estimator with a univariate Kolmogorov-Smirnov test for the four pa- 
rameters. The test used was adapted to the case of unknown mean and variance 
(via Lilhefors procedure), and the normahty assumption cannot be rejected (at 
a level of 5%) as soon as n — 20 for all parameters. 

5. Conclusion 

We have proposed a new family of parametric estimators of ODE's relying on 
nonparametric estimators, which are simpler to compute than straightforward 
parametric estimators such as MLE or LSE. The construction of this parametric 
estimator puts emphasis on the regression interpretation of the ODE's estima- 
tion problem, and on the link between a parameter of the ODE and an associated 
function. By using an intermediate functional proxy, we expect to gain infor- 
mation and precision on likely value of the parameters. We do not have studied 
the effect of using shape or inequality constraints of the estimator but it 
might be valuable information for the inference of complex models, cither by 
shortening the computation time (it gives more suitable initial conditions) or 
by accelerating the rate of convergence of the estimator thanks to restriction to 
smaller sets of admissible parameter values. 

We have particularly studied the case ^(6*), but other M-cstimators such as 
the one obtained from i?^ ^{9) may possess interesting theoretical and practical 
properties such as robustness. This could be particularly useful in the case of 
noisy data which can give oscillating estimates of the derivatives of the function. 

We have given a general study of the two-step estimator (consistency, asymp- 
totic expansion) , and we have shown that the weight function used in ^ 
controls dramatically the rate of convergence of the estimator, and that this 
method can furnish root-n consistent estimators. We have then provided a de- 
tailed account of the asymptotic behavior of spline-based estimators. This choice 
is mainly due to the practical interest and the wide use of splines, but the con- 
clusions may remain the same for usual nonparametric estimators (series esti- 
mators, kernel estimators). Indeed, we have shown that the asymptotic behavior 
of the two-step estimator comes from the behavior of a linear functional of the 
regression function. 

In the experiments, we have illustrated the influence of the weight function, 
as the influence of the solution in the quality of the estimators. In particular, 
we have shown that the approximation quality of the solution is not sufficient 
in order to have a good estimator, and that it depends on the shape of the 
solution. 
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